1  Introduction

This report describes basic (6.1) level research on full motion video analysis. The objective was to investigate
the process of motion blurring in planar scenes and to develop algorithms for blind restoration of blurred images
together with an estimate of the camera motion. This topic is of immense importance to the U.S. Air Force, with a
potential impact on image analysis, characterization and exploitation. The amount of full motion video that we
process has grown exponentially. These images are typically acquired for surveillance purposes and collected
persistently over a fixed field of view, albeit with varying degrees of relative motion between the camera and
objects within the scene. Motion blur results when there is relative motion between camera and scene. It has
acquired special significance as hand-held imaging, aerial imaging, and imaging 'on the move' have come into
prominence. It is also relevant to situations where the camera is still but the scene comprises several moving
objects. The problem is of great relevance within the overall context of aerial surveillance: images captured from
an aerial platform are normally affected by motion blur due to instabilities of the moving platform, and the
resulting smear can severely reduce the utility of such data.

The ubiquity of imaging devices and the fact that practical cameras are real-aperture result in the natural
prevalence of motion/optical blur in images [1]. The problem of image blurring has been studied for many years, the
major thrust having been on optical defocusing, which occurs when the lens settings relative to a scene fail to
satisfy the Gaussian lens law. Depending on the nature of the scene, the resultant image can be either
space-invariantly or space-variantly blurred. Several works already exist that deal with optical blur in a
comprehensive manner. Interestingly, recent times have seen the resurgence of motion blur as an area of great
relevance. In contrast to defocus, the occurrence of motion blur is significantly higher in practical scenarios.
Motion blur can be avoided provided the camera is placed on a firm support. However, carrying sturdy accessories
can be cumbersome and sometimes prohibitive. Even if one were to use auxiliary sensors, blur cannot be completely
avoided [2].

Unlike optical blur, motion blur can be space-varying even for planar scenes, and typically is, since camera
motion invariably involves rotations. An example is a rotating camera imaging a distant scene. In general, the
shape of the blur kernel is a function of the scene as well as the camera motion. For planar scenes, the shape of
the blur kernel is a function of camera motion, while the weights of the kernel are related to the exposure time
spent at each of the geometric transformations that the camera traversed along its motion trajectory. Motion blur
is thus a function of both camera motion and scene depth, has no a priori shape, and is typically sparse. These
characteristics make motion blur unique in several respects compared to the traditionally well-understood optical
blur.

Motion blurring is both a bane and a boon. Most works treat motion blur as a nuisance and seek ways to mitigate
its effects so as to restore the original image. However, motion blur can also serve as a vital cue for camera
motion estimation, depth recovery, super-resolution, image forensics, etc. Blind motion deblurring is both an
interesting and a technically challenging problem. The aim of this project was to study the problem and propose
algorithms that meet our objectives, primarily from a surveillance (not necessarily aerial) perspective. The key
challenge lay in handling the complexities (including loss of resolution) that arise from space-varying local
blurring due to incidental camera motion.

We started joint work with the AFRL team led by Dr. Guna Seetharaman in Sept. 2012. Our focus was on
restoration, registration, dehazing, and super-resolution of motion-blurred images for planar scenes. There were
frequent information exchanges between the P.I. and the AFRL collaborator, at least once every quarter. The
analytical work focused on arriving at global explanations for the blurring process by modeling the motion-blurred
image as an average of projectively transformed images. We first addressed restoration of space-variantly
motion-blurred images. This was followed by a method for registering blurred images to reveal changes, if any. We
then considered how to restore foggy motion-blurred images using depth cues derived from the fog itself. Finally,
for the purpose of image super-resolution, the deblurring framework was extended to handle down-sampling effects
too. For each of these tasks, once the analytical formulations were in place, a simulation and validation phase
followed, with tests involving real data sets. Experimental verification of the algorithms, including computer
simulations, was done to refine both the theoretical analysis and the numerical implementation. While some of the
results arising out of these efforts have already been published in prestigious venues, others are under review.


2  Restoration  of  motion  blur 


During image capture, the camera shutter opens and closes to let light from the scene fall on the imaging
sensors. The shutter interval denotes the amount of time during which the camera sensors observe the scene. The
final image obtained from the camera is a function of the total light energy accumulated by the sensors. If either
the camera or the scene moves while the shutter is open, a particular pixel in the image plane receives light from
more than one point in the scene, resulting in an averaging effect called motion blur [1, 3, 4].

Let $f$ denote the image captured from a camera when there is no relative motion during exposure (shutter
interval), i.e., $f$ represents the unblurred latent image of the scene. Let $g$ denote a blurred image captured
when there is relative motion between the camera and the scene. The intensity value at a pixel of $g$ will be an
average of image intensities at different pixels of $f$, i.e.,

$$g(x, y) = \frac{1}{T_e} \int_0^{T_e} f\big(x - \bar{x}(x, y, \tau),\; y - \bar{y}(x, y, \tau)\big)\, d\tau \tag{1}$$

where $T_e$ is the exposure time, and $\bar{x}(x, y, \tau)$ and $\bar{y}(x, y, \tau)$ denote the components of
the apparent displacement of a point $(x, y)$ in $f$ at time instant $\tau$. In the special case where the
displacement is the same for all the image points, we have $\bar{x}(x, y, \tau) = x_0(\tau)$,
$\bar{y}(x, y, \tau) = y_0(\tau)$ and

$$g(x, y) = \frac{1}{T_e} \int_0^{T_e} f\big(x - x_0(\tau),\; y - y_0(\tau)\big)\, d\tau \tag{2}$$


All the image points undergo the same displacement only when the relative motion is restricted to in-plane
translations and when there is no effect of parallax (i.e., the scene is a fronto-parallel plane). In this
scenario, motion blur can be modeled as a convolution of the original image with a point spread function (PSF),
which is also referred to as a blur kernel [3, 5], i.e.,

$$g(x, y) = (f * h)(x, y) = \iint f(x - s,\; y - t)\, h(s, t)\, ds\, dt \tag{3}$$

The PSF $h$ is given by

$$h(s, t) = \frac{1}{T_e} \int_0^{T_e} \delta\big(s - x_0(\tau),\; t - y_0(\tau)\big)\, d\tau \tag{4}$$


where $\delta$ indicates the 2D Dirac delta function. An interesting interpretation of equation (3) is that the
blurred image $g$ is a weighted average of translated instances of the original image $f$. The PSF $h$ denotes the
weights corresponding to every possible translational shift of the camera. The value of the weight at a position is
equal to the fraction of the exposure duration spent by the camera in that shifted position. In the special case
where the camera velocity is uniform and along a particular direction, the PSF will have uniform weights only along
a straight line whose angle denotes the direction of motion. Note that the PSF cannot be negative. Since the
relative motion is arbitrary, we cannot represent a motion blur kernel in terms of a predetermined functional form.
Also, a motion PSF is typically sparse because the displacements undergone by a scene point form only a small
subset of all possible shifts. If the exposure duration is the same for the original image as well as the blurred
image, the amount of light energy accumulated will be the same for both observations. Hence, by the principle of
energy conservation, the PSF integrates to unity.
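
The reading of the PSF as a set of time fractions can be illustrated with a small discrete sketch. The Python snippet below is only an illustration under stated assumptions: NumPy is assumed, the names `psf_from_trajectory` and `blur_space_invariant` are hypothetical, shifts are restricted to whole pixels, and image boundaries are treated as circular. It accumulates a kernel from a sampled camera trajectory and applies equation (3) as a weighted average of shifted copies of the image.

```python
import numpy as np

def psf_from_trajectory(shifts, size):
    """Discrete PSF from camera shifts sampled at equal time steps.

    shifts: iterable of integer (dx, dy) displacements, one per time slice.
    size:   odd kernel side length, assumed large enough to hold every shift.
    """
    h = np.zeros((size, size))
    c = size // 2
    for dx, dy in shifts:
        h[c + dy, c + dx] += 1.0      # one time slice spent at this shift
    return h / h.sum()                # weights are time fractions, sum to 1

def blur_space_invariant(f, h):
    """Weighted average of shifted copies of f (circular boundaries assumed)."""
    c = h.shape[0] // 2
    g = np.zeros(f.shape, dtype=float)
    for (row, col), weight in np.ndenumerate(h):
        if weight > 0:                # sparse kernel: skip unused shifts
            g += weight * np.roll(f, shift=(row - c, col - c), axis=(0, 1))
    return g

# Vertical shake in which the camera lingers at the rest position.
h = psf_from_trajectory([(0, -1), (0, 0), (0, 0), (0, 1)], size=5)
f = np.zeros((7, 7))
f[3, 3] = 1.0                         # point source
g = blur_space_invariant(f, h)        # the point source is smeared into the kernel
```

Because the kernel is non-negative and sums to one, the total intensity of `g` equals that of `f`, which mirrors the energy-conservation argument above.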

In Fig. 1, we show the effect of space-invariant motion blurring. The original image of a scene, captured from
a still camera, is shown in Fig. 1(a). In Fig. 1(c), we show the motion-blurred image obtained when there was
camera shake during exposure. The blur kernel given in Fig. 1(b) depicts the direction of camera translation. Since
the camera motion was predominantly along the vertical direction, we observe a smearing effect along that direction
in Fig. 1(c).

2.1  Motion  blur  and  TSF  model


When  the  camera  motion  is  not  restricted  to  in-plane  translations,  the  apparent  motion  of  scene  points  in  the  image 
will  vary  at  different  locations  resulting  in  space-variant  blurring.  The  convolution  model  with  a  single  blur  kernel 
does  not  hold  in  such  a  scenario.  However,  when  the  scene  depth  is  constant,  the  blurred  image  can  be  accurately 
modeled  as  a  weighted  average  of  the  warped  instances  of  the  original  image  [6,  7,  8,  9]. 

Let the image of a scene captured by a still camera be denoted by $f : \mathbb{R}^2 \to \mathbb{R}$. Let
$\mathbf{X} = [X\ Y\ Z]^T$ denote the spatial coordinates of a point in the scene with the camera center as the
origin. The projection of $\mathbf{X}$ in the image plane, $(x, y)$, is given by $x = \frac{vX}{Z}$ and
$y = \frac{vY}{Z}$, where $v$ denotes the focal length. Using homogeneous coordinates, the image point
$\mathbf{x} = [x\ y\ 1]^T$ can be written, up to scale, as $K_v \mathbf{X}$. In this discussion, $K_v$ is assumed
to be of the form

$$K_v = \begin{bmatrix} v & 0 & 0 \\ 0 & v & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$


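Due to camera motion during image capture, at each instant of time $\tau$, the coordinates of the 3D point
$\mathbf{X}$ change to $\mathbf{X}_\tau = R_\tau \mathbf{X} + T_\tau$ with respect to the camera, where
$T_\tau = [T_{X\tau}\ T_{Y\tau}\ T_{Z\tau}]^T$ is the translation vector. The rotation matrix $R_\tau$ is
parameterized [10] in terms of $\theta_X$, $\theta_Y$ and $\theta_Z$ (the angles of rotation about the three axes)
using the matrix exponential

$$R_\tau = e^{\Theta_\tau} \quad \text{where} \quad \Theta_\tau = \begin{bmatrix} 0 & -\theta_{Z\tau} & \theta_{Y\tau} \\ \theta_{Z\tau} & 0 & -\theta_{X\tau} \\ -\theta_{Y\tau} & \theta_{X\tau} & 0 \end{bmatrix} \tag{6}$$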

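We consider that all of the scene points are at a depth $d_0$ from the camera. Consequently, the point
$\mathbf{x}_\tau$ at which $\mathbf{X}_\tau$ gets projected in the camera can be obtained through a homography
$H_\tau$ as $\mathbf{x}_\tau = H_\tau \mathbf{x}$ where

$$H_\tau = K_v \left( R_\tau + \frac{1}{d_0}\, T_\tau\, [0\ 0\ 1] \right) K_v^{-1} \tag{7}$$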
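
As a minimal sketch of how a single camera pose maps to a homography under the planar-scene model, the following Python code (NumPy assumed) builds the rotation from the three angles with the Rodrigues formula, which is the closed form of the matrix exponential of a skew-symmetric matrix, and then assembles the homography of equation (7). The function names and the argument order are illustrative choices, not part of the original formulation.

```python
import numpy as np

def skew(theta):
    """Skew-symmetric matrix built from the angles (theta_X, theta_Y, theta_Z)."""
    tx, ty, tz = theta
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def rotation(theta):
    """Matrix exponential of skew(theta), evaluated with the Rodrigues formula."""
    theta = np.asarray(theta, dtype=float)
    angle = np.linalg.norm(theta)
    S = skew(theta)
    if angle < 1e-12:                      # first-order expansion near zero
        return np.eye(3) + S
    return (np.eye(3)
            + (np.sin(angle) / angle) * S
            + ((1.0 - np.cos(angle)) / angle**2) * (S @ S))

def homography(theta, T, d0, v):
    """Homography induced by pose (theta, T) for a plane at depth d0, focal length v."""
    K = np.diag([v, v, 1.0])
    plane = np.outer(np.asarray(T, dtype=float), [0.0, 0.0, 1.0]) / d0
    return K @ (rotation(theta) + plane) @ np.linalg.inv(K)

# An in-plane translation of 2 units at depth 10 with focal length 500
# shifts every image point by v * T_X / d0 = 100 pixels.
H_shift = homography((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), d0=10.0, v=500.0)
```

For a pure rotation about the optical axis the focal length cancels and the homography equals the rotation itself, which is why in-plane rotation leaves the image center unblurred.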


Let $g_\tau$ denote the image captured at time instant $\tau$. For the sake of simplicity, we use the same
notation ($\mathbf{x}$) for the homogeneous coordinates as well as for the coordinates in the image plane. Then we
can write $g_\tau(\mathbf{x}) = f(H_\tau^{-1}\mathbf{x})$, where $H_\tau^{-1}$ denotes the inverse of $H_\tau$
(since $g_\tau(H_\tau \mathbf{x}) = f(\mathbf{x})$). The blurred image $g$ can be considered as the average of the
light intensities observed in the image plane during exposure. The blurred image intensity at an image point
$\mathbf{x}$ is given by

$$g(\mathbf{x}) = \frac{1}{T_e} \int_0^{T_e} f\left(H_\tau^{-1}\mathbf{x}\right) d\tau \tag{8}$$

where $T_e$ is the total exposure duration.

Note that, when averaging over time, the temporal information (the order of the transformations undergone by
the reference image) is lost. But this is a non-issue for the problem at hand. The blurred image can be more
appropriately modeled in terms of the reference image by averaging it over the set of possible transformations
(resulting from the camera motion). Let $\mathcal{T}$ denote the set of all possible transformations and $\Gamma$
denote a transformation. We define the transformation spread function (TSF)
$\omega : \mathcal{T} \to \mathbb{R}^+$ as a mapping from the set $\mathcal{T}$ to non-negative real numbers. For
each transformation $\Gamma \in \mathcal{T}$, the value of the TSF $\omega(\Gamma)$ denotes the fraction of the
total exposure duration for which the camera was in the position that caused the homography $H_\Gamma^{-1}$ on the
image coordinates. It is to be noted that the term $\Gamma$ denotes the transformation parameters corresponding to
the homography matrix $H_\Gamma^{-1}$, and does not indicate a time instant. The blurred image can be written as an
average of the warped images weighted by the TSF $\omega$, i.e.,

$$g(\mathbf{x}) = \int_{\Gamma \in \mathcal{T}} \omega(\Gamma)\, f\left(H_\Gamma^{-1}(\mathbf{x})\right) d\Gamma \tag{9}$$

When the camera motion is not restricted, the paths traced by scene points in the image plane can vary across
the image, resulting in space-variant blur. However, the blurring operation can be described by a single TSF using
equation (9). The TSF depicts the camera motion during exposure. For instance, if the camera undergoes only
in-plane rotations, the TSF will have non-zero weights only for the rotational transformations. Analogous to a blur
kernel, the TSF satisfies the relation $\int_{\Gamma \in \mathcal{T}} \omega(\Gamma)\, d\Gamma = 1$ (assuming an
equal amount of light energy is involved in the formation of $f$ and $g$).
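
A discrete counterpart of equation (9) is a finite weighted sum of warped copies of the latent image. The sketch below (Python with NumPy) is one simple way to write it, under assumptions that are not in the original text: homographies are given directly in pixel units with the origin at the image center, warping uses nearest-neighbour lookup, and pixels that map outside the image are set to zero. The names `warp` and `tsf_blur` are hypothetical.

```python
import numpy as np

def warp(f, H):
    """Return the image g with g(x) = f(H^{-1} x), nearest neighbour, zero outside."""
    rows, cols = f.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0        # origin at image center
    pts = np.stack([xs.ravel() - cx, ys.ravel() - cy, np.ones(f.size)])
    src = np.linalg.inv(H) @ pts                        # inverse mapping
    sx = np.rint(src[0] / src[2] + cx).astype(int)
    sy = np.rint(src[1] / src[2] + cy).astype(int)
    inside = (sx >= 0) & (sx < cols) & (sy >= 0) & (sy < rows)
    g = np.zeros(f.size)
    g[inside] = f[sy[inside], sx[inside]]
    return g.reshape(f.shape)

def tsf_blur(f, homographies, weights):
    """Blurred image as the TSF-weighted average of warped copies of f."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("TSF weights must be non-negative and sum to one")
    return sum(w_k * warp(f, H_k) for w_k, H_k in zip(w, homographies))

# Two poses: rest position and a one-pixel shift along x, half the exposure each.
identity = np.eye(3)
shift_x = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
f = np.zeros((7, 7))
f[3, 3] = 1.0
g = tsf_blur(f, [identity, shift_x], [0.5, 0.5])
```

With only translational poses this reduces to the convolution model; the same routine handles rotations or general homographies without any change, which is the practical appeal of the TSF description.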

Although the TSF model is more efficient than the PSF model in representing space-variant motion blur, it is
useful to relate the two models. The blurred image $g$ can be modeled with a space-variant PSF $h$ as

$$g(\mathbf{x}) = (f *_v h)(\mathbf{x}) = \int f(\mathbf{x} - \mathbf{u})\, h(\mathbf{x} - \mathbf{u}, \mathbf{u})\, d\mathbf{u} \tag{10}$$

where $h(\mathbf{x}, \mathbf{u})$ denotes the blur kernel at the image point $\mathbf{x}$ as a function of the
independent variable $\mathbf{u}$. The PSF $h(\mathbf{x}, \mathbf{u})$ represents the displacements undergone by a
point light source at $\mathbf{x}$ during the exposure and is weighted according to the fraction of the exposure
time the light source stays at the displaced position. Following our discussions in section 2, the PSF can be
written as

$$h(\mathbf{x}, \mathbf{u}) = \frac{1}{T_e} \int_0^{T_e} \delta\left(\mathbf{u} - \bar{\mathbf{x}}_\tau\right) d\tau \tag{11}$$

where $\delta$ indicates the 2D Dirac delta function and $\bar{\mathbf{x}}_\tau$ is the instantaneous
displacement. It is to be noted that, for brevity, we use the vectors $\mathbf{x}$ and $\mathbf{u}$ to indicate
locations. The PSF $h$ at an image point $\mathbf{x}$ can be obtained from the TSF $\omega$ by finding the
displacement induced by each of the possible transformations. This relationship can be written as

$$h(\mathbf{x}, \mathbf{u}) = \int_{\Gamma \in \mathcal{T}} \omega(\Gamma)\, \delta\left(\mathbf{u} - (H_\Gamma \mathbf{x} - \mathbf{x})\right) d\Gamma \tag{12}$$
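
Equation (12) can be evaluated numerically by moving a point through every pose in a discrete TSF and accumulating the weights at the resulting displacements. The following Python sketch (NumPy assumed) does this for a TSF made of three in-plane rotations; the function name `psf_at`, the rounding of displacements to whole pixels and the choice of the rotation center as coordinate origin are assumptions made for illustration.

```python
import numpy as np

def psf_at(x, homographies, weights, size):
    """Local blur kernel at image point x = (x, y) induced by a discrete TSF."""
    h = np.zeros((size, size))
    c = size // 2
    p = np.array([x[0], x[1], 1.0])
    for H, w in zip(homographies, weights):
        q = H @ p
        u = q[:2] / q[2] - p[:2]                 # displacement H x - x
        col, row = np.rint(u).astype(int) + c
        if 0 <= row < size and 0 <= col < size:  # drop shifts outside the kernel
            h[row, col] += w
    return h

def in_plane_rotation(angle_deg):
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

poses = [in_plane_rotation(a) for a in (-2.0, 0.0, 2.0)]
weights = [1.0 / 3.0] * 3
h_center = psf_at((0.0, 0.0), poses, weights, size=9)    # a single spike
h_edge = psf_at((100.0, 0.0), poses, weights, size=9)    # a vertical streak
```

The kernel at the rotation center collapses to a spike while the kernel 100 pixels away spreads over about seven pixels, so one TSF yields a different PSF at every location.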


Consider  the  scene  whose  latent  image  is  shown  in  Fig.  2  (a).  There  were  no  depth  variations  in  this  scene. 
However,  the  camera  underwent  in-plane  rotational  motion  during  image  capture  leading  to  the  non-uniformly  blurred 
image  shown  in  Fig.  2  (b).  We  note  that  the  region  around  the  image  center  does  not  appear  blurred  while  the  effect 
of  blurring  increases  as  we  move  away  from  the  center. 



Figure 2: (a) Unblurred image, (b) Space-variantly blurred observation.


2.2  Related  works 

The problem of inferring an image in the presence of blur (i.e., image deblurring) has been widely studied in
the literature [4, 7, 11, 12, 13]. Traditional approaches assume that the blur remains constant across all image
points. This will be the case when the camera motion is restricted to in-plane translations and the scene is of
constant depth. Many approaches address blind motion deblurring for the case of uniform blur [4, 14, 15, 16]. A
comparison of recent deconvolution techniques for space-invariant blur can be found in [13]. In practical
scenarios, blurring due to camera shake is space-variant in nature and the convolution model does not hold [17].
The authors of [18] proposed a non-blind deblurring scheme which estimates the latent image even if the blur
kernels are erroneous, and demonstrated its applicability to space-variant blur. Based on this work, in [19], they
address blind space-variant deblurring by estimating the blur kernel at each pixel. Techniques exist that address
restoration of non-uniform blur by a local space-invariance approximation [20, 21]. Recent techniques avoid such an
approximation by modeling the motion-blurred image as an average of projectively transformed images [6, 7, 8, 9].
In this approach, blur is modeled by considering the transformations undergone by the image plane rather than using
a point spread function which varies at every pixel. A deblurring scheme for the projective blur model based on
Richardson-Lucy deconvolution is proposed in [6]. However, it does not address the problem of determining the
blurring function. An image restoration technique for non-uniform motion blur arising from camera rotations is
proposed in [7, 10]. The blurring function is represented on a 3D grid corresponding to the three directions of
camera rotation. For the case of blind image restoration, the kernel estimation framework is employed in [4]. When
a noisy version of the original image is available, a least-squares energy minimization approach is used for
finding the blurring function. Another blind deblurring scheme, in which the camera motion is considered to
comprise 2D translations and in-plane rotation, is proposed in [9]; it is based on the method in [14]. In the
restoration technique of [22], sensors are attached to the camera to determine the blurring function. A deblurring
scheme that uses coded exposure and some simple user interactions to determine the space-variant PSF is proposed in
[23]. The approach in [24] restores non-uniform motion blur by using the efficient filter flow framework [25]. In
[26], a new regularization scheme that compensates for the attenuation of high frequencies is used to perform blind
deblurring. While [27] uses two non-uniformly blurred observations and iteratively solves for the latent image and
its transformations, the authors of [28] propose to iteratively estimate the latent image and the blur from a
single image.


2.3  Blind  image  restoration 


Our algorithm proceeds by updating the estimate of the camera pose at one step, and the latent image in the
next. To this end, we minimize the following energy function. Let $\omega$ denote the TSF and $\mathbf{l}$ the
latent image $f$. The cost to be minimized is

$$E(\mathbf{w}, \mathbf{l}) = \left\| \sum_{k \in \mathcal{T}} \omega(k)\, \mathbf{l}(H_k \mathbf{x}) - \mathbf{b} \right\|_2^2 + \Phi_1(\mathbf{l}) + \Phi_2(\mathbf{w}) \tag{13}$$

where $\mathbf{w}$ denotes the vector of weights $\omega(k)$, $\mathbf{b}$ is the blurred observation, and
$\Phi_1$ and $\Phi_2$ represent the regularization terms on the latent image $\mathbf{l}$ and the weights
$\mathbf{w}$, respectively. The regularization terms are explained in the following sub-sections.

2.3.1  TSF  estimation 

In the TSF estimation step, we compute $\mathbf{w}$ given the current estimate of the latent image $\mathbf{l}$
based on equation (13). Our algorithm requires the user to specify a rough guess of the extent of the blur
(translation in pixels along the x and y axes and rotation in degrees about the z axis) to build the initial TSF.
The three-dimensional camera pose subspace, whose limits are specified by the user, is uniformly sampled to build
the first 'active' set of camera poses. We denote this active set by $\mathcal{A}$, where
$\mathcal{A} \subset \mathcal{T}$. (In our experiments, the active set contained 200 poses, which is much smaller
than the 1500-2000 poses that the whole space $\mathcal{T}$ would contain even for small to moderate blurs.) Our
algorithm requires no other user input.
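
The uniform sampling of the user-specified pose box can be sketched as below (Python with NumPy). The grid resolution of 5 x 5 x 8 is an assumption chosen only so that the set has 200 poses as in the experiments; the report does not state how the 200 samples were distributed over the three axes, and the uniform starting weights are likewise an illustrative choice.

```python
import itertools
import numpy as np

def initial_active_set(tx_max, ty_max, rot_max_deg, n_tx, n_ty, n_rot):
    """Uniformly sample the box [-tx_max, tx_max] x [-ty_max, ty_max] x [-rot, rot].

    Returns an array of (tx, ty, rotation) poses and uniform starting weights.
    """
    tx = np.linspace(-tx_max, tx_max, n_tx)
    ty = np.linspace(-ty_max, ty_max, n_ty)
    rot = np.linspace(-rot_max_deg, rot_max_deg, n_rot)
    poses = np.array(list(itertools.product(tx, ty, rot)))
    weights = np.full(len(poses), 1.0 / len(poses))   # assumed uniform start
    return poses, weights

# Blur extent guessed by the user: up to 5 pixels of shift and 2 degrees of rotation.
poses, weights = initial_active_set(5.0, 5.0, 2.0, n_tx=5, n_ty=5, n_rot=8)
```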


Figure  3:  Overview  of  the  proposed  deblurring  algorithm. 


In the first iteration, we optimize over the initial TSF (obtained in 2.3.1) by minimizing the following energy
function

$$E(\mathbf{w}) = \left\| \sum_{k \in \mathcal{A}} \omega(k)\, \mathbf{l}(H_k \mathbf{x}) - \mathbf{b} \right\|_2^2 + \Phi_2(\mathbf{w}) \tag{14}$$

where $\Phi_2(\mathbf{w}) = \beta \|\mathbf{w}\|_1$ is a sparsity prior. Since image derivatives have been shown
to be effective for reducing ringing effects [14], we work on gradients instead of image intensities in our
implementation of equation (14). This optimization problem can be solved using the nnLeastR function of the Lasso
algorithm [29], which handles the additional $\ell_1$-norm constraint and imposes non-negativity on the TSF
weights. Only the 'dominant' poses in the active set $\mathcal{A}$ are selected as a result of the sparsity
constraint imposed by the $\ell_1$-norm, and the remaining poses, which are outliers, are removed. We then rebuild
the active set for the next iteration so that its cardinality is the same as that of the current $\mathcal{A}$. The
new poses are selected, in a manner similar to [28], by sampling based on the current $\mathcal{A}$ using a
Gaussian distribution. In the next iteration, equation (14) is minimized over this new active set. The variance of
the Gaussian distribution is gradually reduced with iterations as the estimated TSF converges to the true TSF. In
our experiments on synthetic and real data, the algorithm exhibited good convergence properties and did not get
stuck in local minima. We used a $\beta$ value of 0.5 for our experiments.
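
To make the two operations of this step concrete, the sketch below (Python with NumPy) shows a plain projected proximal-gradient solver for a non-negative, $\ell_1$-regularized least-squares problem of the form (14), followed by a Gaussian resampling of the active set. This is a stand-in, not the nnLeastR routine of [29]; the function names, the fixed step size, and the choice to draw parents in proportion to their weights are all assumptions made for illustration.

```python
import numpy as np

def nn_l1_least_squares(A, b, beta, n_iter=500):
    """Minimize ||A w - b||^2 + beta * ||w||_1 subject to w >= 0.

    For w >= 0 the l1 norm equals sum(w), so each iteration is a gradient
    step on the smooth part followed by a shift by beta / L and a clip at zero.
    """
    L = 2.0 * np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ w - b)
        w = np.maximum(0.0, w - (grad + beta) / L)
    return w

def resample_active_set(poses, w, sigma, rng):
    """Keep poses with non-zero weight and refill the set around them.

    New poses are Gaussian perturbations (standard deviation sigma) of kept
    poses, with parents drawn in proportion to their weights (an assumption).
    At least one pose is assumed to survive the sparsity step.
    """
    dominant = w > 0
    kept = poses[dominant]
    n_new = len(poses) - len(kept)
    p = w[dominant] / w[dominant].sum()
    parents = kept[rng.choice(len(kept), size=n_new, p=p)]
    children = parents + rng.normal(scale=sigma, size=parents.shape)
    return np.vstack([kept, children])
```

In such a scheme `sigma` would be reduced from one outer iteration to the next, matching the shrinking variance described above.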

2.3.2  Latent  image  estimation 

We first perform image prediction at each iteration before the TSF estimation step to obtain more accurate
results and to facilitate faster convergence. The prediction step consists of bilateral filtering, shock filtering
and gradient magnitude thresholding. The predicted image is sharper than the latent image estimate $\mathbf{l}$
from the previous iteration and has fewer artifacts.

The latent image $\mathbf{l}$ is estimated by fixing the TSF weights $\mathbf{w}$. The blurring matrix is
constructed using only the poses in the active set, since the weights of the poses in the inactive set are zero,
i.e., $\mathbf{H} = \sum_{k \in \mathcal{A}} \omega(k)\, \mathbf{H}_k$, and the energy function to be minimized
takes the form

$$E(\mathbf{l}) = \|\mathbf{H}\mathbf{l} - \mathbf{b}\|_2^2 + \Phi_1(\mathbf{l}) \tag{15}$$

We use the regularization terms in [14] and a conjugate gradient approach for optimization. An overview of our
method is given in Fig. 3.
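
The structure of the latent image update can be illustrated with a toy conjugate gradient solver. In the Python sketch below (NumPy assumed), the regularizer of [14] is replaced by a simple quadratic penalty on first differences so that minimizing (15) reduces to a symmetric positive-definite linear system; this substitution, the 1D signal and all function names are assumptions made to keep the example short.

```python
import numpy as np

def conjugate_gradient(apply_M, rhs, n_iter=200, tol=1e-12):
    """Solve M x = rhs for symmetric positive-definite M given as a function."""
    x = np.zeros_like(rhs)
    r = rhs - apply_M(x)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:             # residual small enough: stop
            break
        Mp = apply_M(p)
        alpha = rs / (p @ Mp)
        x = x + alpha * p
        r = r - alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def estimate_latent(H, b, lam):
    """Minimize ||H l - b||^2 + lam * ||D l||^2 with D a first-difference matrix."""
    n = H.shape[1]
    D = np.eye(n) - np.eye(n, k=1)
    M = H.T @ H + lam * (D.T @ D)         # normal equations of the quadratic cost
    return conjugate_gradient(lambda v: M @ v, H.T @ b)

# Toy 1D blur with kernel [0.25, 0.5, 0.25] applied to a ramp signal.
n = 8
H = 0.5 * np.eye(n) + 0.25 * np.eye(n, k=1) + 0.25 * np.eye(n, k=-1)
l_true = np.linspace(0.0, 1.0, n)
b = H @ l_true
l_est = estimate_latent(H, b, lam=1e-3)
```

For images, `apply_M` would be implemented with warps and filters instead of an explicit matrix, so that the blurring matrix never has to be stored.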

2.4  Results 

We tested our method on synthetic and real data. Fig. 4 shows a synthetic case where a focused original image
was blurred using a randomly generated 3D TSF to obtain the blurred image in Fig. 4(a). The parameters of the TSF
ranged as follows: $\theta$ ranged from -1.5 to 1.5 degrees in steps of 0.25, and $t_x$ and $t_y$ ranged from -4
to 4 pixels in steps of one pixel. The number of non-zero transformations in the TSF was set to 25. The poses in the
TSF were defined in such a way that they depict the path traversed by a camera with non-uniform velocity. A slightly
overestimated blur size of -2 to 2 degrees along $\theta$ and -5 to 5 pixels along $t_x$ and $t_y$ was input to
our deblurring algorithm. Our deblurred result (Fig. 4(b)) is sharp and free from artifacts.

The real examples shown in the first and second rows of Fig. 5 are aerial images from the VIRAT dataset. The
efficacy of our method in deblurring these real images is evident from the sharp, artifact-free results.

Figure  4:  (a)  Blurred  observation,  (b)  Deblurred  result. 


3  Registration  and  occlusion  detection  in  motion  blur 

Detecting occluded regions in images is an extensively studied problem in image processing and computer vision
due to its applicability to a vast range of areas such as tracking, surveillance, object recognition, and
inpainting [30], [31], [32], [33], [34]. The objective, in a typical setting, is to automatically detect
occlusion(s) given a pair of images taken from different viewpoints and at different times. The occlusions
themselves may have been caused by the entry or disappearance of objects in the scene within the time-span of the
two observations. A common approach is to first compensate for the variations in pose by registering the two images
with respect to each other, followed by differencing to reveal changes in the scene. For small occlusions, the
images can be aligned even using standard registration techniques [35], [36] that do not account for occlusions.
This is because the matching of unoccluded pixels can be expected to sufficiently outweigh any possible degradation
arising from attempting to match occluded pixels with unoccluded pixels and vice versa. However, larger occlusions
warrant methods that detect the occluded pixels and exclude them from the registration process [37]. This
challenging problem of detecting occlusions becomes even more ill-posed if one of the images in the pair is blurred
due to camera shake. This is often the case when a quick fly-through is attempted for the re-coverage of a
particular geographic area for which detailed surveillance images (i.e., latent images) are already available.
Moreover, if the revisit is made at a time when the illumination is weak [38], then the exposure time needs to be
increased, thereby increasing the chances of motion blur. Detecting occlusions is important for revealing changes
in infrastructure, deployment of military units, modification/introduction of equipment, etc. As pointed out in
[39], traditional registration methods such as direct and feature-based approaches cannot be used in such a case
due to photometric inconsistencies introduced by the blur. The alignment approach presented in [39] is based on the
convolution model and applies only to the restrictive uniform blur case. However, in the case of general camera
motion, the blur incurred can be significantly non-uniform across the image, and a space-varying formulation
becomes necessary to describe the blurring process. It is this compounded scenario that we address in this work.
Note that there can be more than one occluder.

Figure 5: VIRAT dataset, (a) Motion-blurred real observations, (b) Deblurred results.

While conventional approaches to detecting occluder(s) would require one to follow the deblur-register-difference
pipeline, we present a unified framework which directly solves for the occluder(s) by accounting for the
non-uniform blur and the changes in camera pose given a blurred/unblurred image pair. We show that direct
registration of the pair is possible without the need for deblurring. Registration, in this context, amounts to
estimating the set of warps which, when applied to the focused image, align it with the blurred image in the region
of overlap. The elegance of our method lies in the fact that registration and occlusion detection turn out to be a
natural fallout of our blur estimation process. We assume that the occluded pixels occupy only a relatively small
portion of the image and that the camera motion trajectory is sparse in the camera motion space. We also assume the
scene to be sufficiently far away so that depth variations can be ignored. We use a multiscale approach in which
the image resolution is varied from coarse to fine, thus rendering the algorithm efficient both in terms of
computational time and memory requirements.


3.1  Sparsity,  registration  and  occlusion  handling 

If $\mathbf{l}$ and $\mathbf{b}$ represent the latent image and the blurred image, respectively,
lexicographically ordered as vectors, then, in matrix-vector notation, we can write

$$\mathbf{b} = \mathbf{A}\boldsymbol{\omega} \tag{16}$$

where $\mathbf{A}$ is the matrix whose columns contain projectively transformed copies of $\mathbf{l}$, and
$\boldsymbol{\omega}$ denotes the vector of weights $\omega(k)$. Note that $\boldsymbol{\omega}$ is a sparse vector
since the blur is typically due to incidental camera shake and only a small fraction of the poses in $\mathcal{T}$
will have non-zero weights in $\boldsymbol{\omega}$.

In the scenario that we consider, one of the images in the pair is not only blurred because of camera jitter
but can also contain occluder(s). To deal with this situation, we modify the linear model of (16) as

$$\mathbf{b}_{occ} = \mathbf{b} + \mathbf{o} \tag{17}$$

where $\mathbf{b}_{occ}$ is the blurred and occluded observation. In the image formation model, the occlusion
happens first, followed by blurring, i.e., $\mathbf{b}_{occ}$ is the weighted average of warped instances of an
unknown focused image containing occlusions. The non-zero entries of $\mathbf{o}$, therefore, model the blurred
occluder(s) in $\mathbf{b}_{occ}$. Since the occluder(s) can have arbitrary intensities, techniques designed for
small noise cannot be used here. The locations of occlusion differ for different input images and are not known a
priori to the algorithm. But we assume that the occluded pixels occupy only a relatively small portion of the
image. Therefore, the occlusion vector $\mathbf{o}$, in the same vein as the vector $\boldsymbol{\omega}$, has
sparse non-zero entries [40]. Since $\mathbf{b} = \mathbf{A}\boldsymbol{\omega}$, we rewrite equation (17) as


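$$\mathbf{b}_{occ} = \begin{bmatrix}\mathbf{A} & \mathbf{I}\end{bmatrix}\begin{bmatrix}\boldsymbol{\omega}\\ \mathbf{o}\end{bmatrix} = \mathbf{B}\mathbf{x} \tag{18}$$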


Here $\mathbf{B} = [\mathbf{A}\ \ \mathbf{I}] \in \mathbb{R}^{N \times (N_T + N)}$, where $N$ is the total number of pixels in the image, $N_T$ is the total number of transformations in $\mathbf{T}$, and $\mathbf{I}$ is an $N \times N$ identity matrix. Hence, the system $\mathbf{b}_{occ} = \mathbf{B}\mathbf{x}$ is always underdetermined and does not have a unique solution for $\mathbf{x}$. We, therefore, attempt to recover $\mathbf{x}$ as the sparsest solution to the system $\mathbf{b}_{occ} = \mathbf{B}\mathbf{x}$. Note that in the absence of occlusion, $\mathbf{x} = \boldsymbol{\omega}$ and our problem reduces to the special case of estimating a sparse TSF.

With the occlusion model incorporated, the energy function to be minimized takes the form

$$E(\mathbf{x}) = \|\mathbf{b}_{occ} - \mathbf{B}\mathbf{x}\|_2^2 + \beta\|\mathbf{x}\|_1 \tag{19}$$

subject to $\omega(k) \geq 0$ for all $k \in \mathbf{T}$ and $\sum_{k \in \mathbf{T}} \omega(k) = 1$.

In the absence of an occluder, the convex combination of the elements of a particular row, say $i$, of $\mathbf{A}$ produces the intensity of the blurred pixel at the $i$th location in the image. If the observed intensity (in $\mathbf{b}_{occ}$) at the $i$th pixel is greater than the maximum intensity of the elements of the $i$th row, then, by convexity, we can deduce that it is the presence of a bright occluder that causes the intensity at that pixel to increase. A positive value in $\mathbf{o}$ will then explain the observed intensity at that pixel. On the other hand, if the observed intensity at the $i$th pixel is less than the minimum intensity of the elements of the $i$th row, we conclude that the occluder is dark. In this case, we replace the ‘1’ at the corresponding location in $\mathbf{I}$ with a ‘-1’. This change in sign permits us to impose non-negativity on $\mathbf{x}$ because the residual can now take both positive and negative values. Thus $\mathbf{B}$ now becomes $[\mathbf{A}\ \ \mathbf{I}_{mod}]$, where $\mathbf{I}_{mod}$ is a diagonal matrix (with +1 and -1 along the diagonal) obtained after verifying the above condition.
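The sign-selection rule can be sketched in a few lines. The example below is illustrative only: it uses a toy matrix with four pixels and two poses, assumes the true TSF is known so that the occlusion part of $\mathbf{x}$ can be read off directly, and does not perform the sparse minimization of (19). The function names `occluder_signs` and `build_B` are hypothetical.

```python
def occluder_signs(A_rows, b_occ):
    """Diagonal of I_mod: -1 where the observation is darker than every entry
    in the corresponding row of A (dark occluder), +1 otherwise."""
    return [-1.0 if b < min(row) else 1.0 for row, b in zip(A_rows, b_occ)]


def build_B(A_rows, signs):
    """Return B = [A  I_mod] as a list of dense rows."""
    n = len(A_rows)
    return [list(row) + [signs[i] if j == i else 0.0 for j in range(n)]
            for i, row in enumerate(A_rows)]


# Toy data (illustrative values): 4 pixels, 2 candidate poses.
A_rows = [[20.0, 80.0], [40.0, 60.0], [60.0, 40.0], [80.0, 20.0]]
omega = [1.0, 0.0]                    # the camera stayed at the first pose
b_occ = [20.0, 40.0, 110.0, 5.0]      # pixel 2: bright occluder, pixel 3: dark

signs = occluder_signs(A_rows, b_occ)
B = build_B(A_rows, signs)
blurred = [sum(a * w for a, w in zip(row, omega)) for row in A_rows]
# After the sign flip, the occlusion part of x is non-negative everywhere.
occ = [s * (b - v) for s, b, v in zip(signs, b_occ, blurred)]
x = omega + occ
reconstructed = [sum(bij * xj for bij, xj in zip(row, x)) for row in B]
print(signs, occ)
```

The dark occluder at the last pixel would need a negative entry under $[\mathbf{A}\ \ \mathbf{I}]$; with $\mathbf{I}_{mod}$ the same observation is explained by a non-negative $\mathbf{x}$.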

3.2  Experiments 

This  section  consists  of  two  parts.  We  first  evaluate  the  performance  of  our  algorithm  on  synthetic  data.  Following 
this,  we  demonstrate  the  applicability  of  the  proposed  method  on  real  images. 

We begin with a synthetic example. A latent image of size 240 x 240 pixels of an airport bay is shown in Fig. 6(a). The same scene from a different camera pose and with synthetically added occluders (enclosed in red boxes) is shown in Fig. 6(b). The TSF space is chosen as follows. In-plane translations: $T_x, T_y = [-7 : 1 : 7]$; in-plane rotation: $R_z = [-3^\circ : 1^\circ : 3^\circ]$; out-of-plane translation: $T_z = [0.95 : 0.05 : 1.05]$; and out-of-plane rotations: $R_x, R_y = [\,\ldots\,]$. To simulate the motion of the camera, we manually generate 6D camera motion with a connected path in the motion space and initialize the weights. The synthesized camera motion (TSF model) is applied on Fig. 6(b) to produce the blurred and occluded image (Fig. 6(c)). To evaluate the proposed method, we set the number of scales in the multiscale implementation to 3 and first coarsely align the latent image and the blurred and occluded image at the lowest resolution without accounting for occlusion. In this step, the transformation intervals are expanded to $T_x, T_y = [-40 : 1 : 40]$ and $R_z = [-8^\circ : 1^\circ : 8^\circ]$ to accommodate the large change in pose between the two images. Note that this increase in the transformation intervals is not very demanding because we work at the lowest resolution of the image and the TSF in the multiscale algorithm. The ‘dominant pose’, i.e., the pose with the highest weight in the estimated vector $\boldsymbol{\omega}$, is used to align the latent image with the blurred image. The TSF is now built around this dominant pose and we minimize equation (19) using the multiscale approach, this time also taking occlusions into consideration. Fig. 6(d) shows the latent image reblurred using the estimated $\boldsymbol{\omega}$ and overlaid on the blurred and occluded observation. It is to be noted that the TSF model implicitly accounts for the change in pose between the two images. The residual image shown in Fig. 6(e) is the absolute difference between the blurred and occluded observation (Fig. 6(c)) and the reblurred latent image. Note that the occluders are correctly detected.

Figure 6: (a) Latent image, (b) latent image from a different camera pose and with synthetically added occluders, (c) blurred and occluded observation, (d) latent image reblurred using the estimated camera motion and overlaid on the blurred and occluded observation, and (e) residual image.

A real example but with an appreciable change in viewpoint is shown in Fig. 7. A zoomed-in view of the blurred occluders (in this case people) is shown in Fig. 7(c). The latent image reblurred using the estimated $\boldsymbol{\omega}$ and registered with the blurred and occluded observation is shown in Fig. 7(d). The residual image (Fig. 7(e)) reveals that the dark occluders have been accurately detected by the proposed method. Fig. 8 depicts an example from the VIRAT dataset in which, after the process of registration, the moving truck has been detected correctly as occlusion in Fig. 8(d).

Figure 7: An aerial view of a parking lot. (a) Latent (unblurred) image, (b) blurred and occluded observation taken from a different viewpoint, (c) zoomed-in regions showing the presence of non-uniform blur, (d) reblurred latent image registered with the blurred and occluded observation, and (e) residual image.

Figure 8: VIRAT dataset. (a) Blurred scene with no activity, (b) blurred scene with activity, (c) image of (b) after registration, (d) detected activity.

4 Restoration of foggy and motion-blurred scenes

In a medium such as fog, light rays get attenuated space-variantly before they reach the camera sensor. The scattering coefficient of such a medium is high and each ray gets attenuated by a multiplicative factor that is an exponentially decaying function of the scene depth and scattering coefficient. Most of the methods for defogging need more information than just a single image. In [41], multiple images of the same scene under different atmospheric conditions were used. In [42], prior knowledge of the depth map was used while in [43] a polarization filter was employed. Recently, single image defogging techniques have been proposed in [44, 45]. A refined image formation model was used in [44] to account for surface shading while in [45], the dark channel prior was used.

We  investigate  the  problem  of  motion  blur  due  to  camera  shake  in  a  foggy  scene  and  mathematically  justify  that 
the  motion-blurred  and  medium-attenuated  scene  radiance  can  be  modelled  using  a  modified  image  formation  model 
in  fog  wherein  the  motion-blurred  radiance  replaces  the  original  scene  radiance.  The  dark  channel  prior  is  then  used 
to  get  a  coarse  depth  map  of  the  scene  along  with  the  blurred  scene  radiance.  For  in-plane  translational  motion-blurred 
images,  the  depth  map  is  directly  used  for  restoration  by  exploiting  the  scaling  relationship  among  the  motion  blur 
kernels  at  different  depths.  For  the  case  of  general  camera  motion,  blur  kernels  are  estimated  at  multiple  patches  and 
they  could  be  at  different  depths.  The  blur  kernel  at  a  point  is  written  as  a  convex  sum  of  weighted  impulses  located  at 
the  displacements  undergone  by  a  pixel  under  the  influence  of  camera  motion.  These  kernels  help  deduce  the  camera 
motion  responsible  for  the  observed  blurring.  While  the  blur  induced  by  rotation  is  depth-invariant,  the  coarse  depth 
map  derived  from  fog  is  used  to  scale  the  translational  motion  by  scene  depth.  Fog  is  also  used  as  a  cue  to  segment 
road  scene  images  into  road,  left,  right  and  sky  planes.  Finally,  knowledge  of  the  camera  motion  is  used  in  conjunction 
with  segmentation  to  deblur  each  of  these  planes.  The  framework  can  be  comfortably  applied  on  aerial  images  too 
wherein  only  the  ground  plane  exists. 

4.1  Image  formation  in  haze  and  blur 

In the absence of motion blur, a color image captured in fog can be written [44, 45] as

$$I(\mathbf{x}) = t(\mathbf{x})J(\mathbf{x}) + (1 - t(\mathbf{x}))A \tag{20}$$

where $J$ is the scene radiance, $A$ (a constant) is the ambient light in the atmosphere, $\mathbf{x}$ is a vector representing the spatial location of a pixel, and $t$ is related to the medium scattering coefficient ($\beta$) and scene depth ($d$) through the relation $t(\mathbf{x}) = \exp(-\beta d(\mathbf{x}))$. If the camera underwent shake during exposure, then the captured image would be a blurred version of the original image. We can write this as

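$$B(\mathbf{x}) = \frac{1}{T_e}\int_0^{T_e} I(\mathbf{x} - \mathbf{x}_\tau)\, d\tau \tag{21}$$

where $T_e$ is the camera exposure time and $\mathbf{x}_\tau$ is the motion path of pixel $\mathbf{x}$. Therefore, the blurred image captured in fog can be written in integral form as

$$B(\mathbf{x}) = \frac{1}{T_e}\int_0^{T_e}\left[J(\mathbf{x} - \mathbf{x}_\tau)\exp(-\beta d_{\mathbf{x} - \mathbf{x}_\tau}) + A\left(1 - \exp(-\beta d_{\mathbf{x} - \mathbf{x}_\tau})\right)\right] d\tau$$

Since neighboring points $\mathbf{x}$ and $(\mathbf{x} - \mathbf{x}_\tau)$ will be approximately at the same depth, we can write

$$B(\mathbf{x}) = \exp(-\beta d_{\mathbf{x} - \mathbf{x}_\tau})\,\frac{1}{T_e}\int_0^{T_e} J(\mathbf{x} - \mathbf{x}_\tau)\, d\tau + \left(1 - \exp(-\beta d_{\mathbf{x} - \mathbf{x}_\tau})\right)A$$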

It is interesting to observe that the above equation is very similar in form to the fog image formation model (equation (20)) except that $J$ has been replaced by the term $\frac{1}{T_e}\int_0^{T_e} J(\mathbf{x} - \mathbf{x}_\tau)\, d\tau$, which is, in fact, the blurred radiance of the scene captured in a lossless medium. Consequently, we can write the modified image formation model taking both blur and fog into account as

$$I(\mathbf{x}) = t(\mathbf{x})J_B(\mathbf{x}) + (1 - t(\mathbf{x}))A \tag{22}$$

where $J_B = \frac{1}{T_e}\int_0^{T_e} J(\mathbf{x} - \mathbf{x}_\tau)\, d\tau$ and $I$ now denotes the blurred, foggy observation. The implication of this result is that a coarse depth map and blurred radiance can be obtained using the dark channel prior [45] akin to the case of a pure foggy image. We express the dark channel of the motion blurred radiance as

$$J_{B,dark}(\mathbf{x}) = \min_{\mathbf{s} \in \omega(\mathbf{x})}\ \min_{c}\left(J_{Bc}(\mathbf{s})\right) \tag{23}$$

where $c$ is the color channel and $\omega(\mathbf{x})$ is a small neighborhood around pixel $\mathbf{x}$. As discussed in [45], the principle behind the dark channel prior is that one of the color channels in the proximity of a point is close to zero for outdoor scenes. Since $J_B$ is, in fact, a blurred version of the scene radiance, it stands to reason that $J_{B,dark}(\mathbf{x})$ will be close to zero. Consequently, dividing equation (22) by the airlight $A_c$ of each channel and taking the minimum over the channels and over the neighborhood (within which $t$ is taken to be constant, as in [45]) yields $t(\mathbf{x})$ as

$$t(\mathbf{x}) = 1 - \min_{\mathbf{s} \in \omega(\mathbf{x})}\left(\min_{c}\left(\frac{I_c(\mathbf{s})}{A_c}\right)\right) \tag{24}$$
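Equations (22) and (24) can be exercised on a tiny synthetic image. The sketch below is a minimal illustration under strong assumptions: the airlight is known, the transmission is constant over the image, and every pixel of the synthetic radiance has one zero channel so that its dark channel is exactly zero. The helper names `add_fog`, `transmission` and `recover` are hypothetical, and the matting and bilateral filtering refinements are not included.

```python
def add_fog(radiance, t, airlight):
    """Equation (22) with constant transmission t: I = t * J_B + (1 - t) * A."""
    return [[tuple(t * j + (1.0 - t) * a for j, a in zip(px, airlight))
             for px in row] for row in radiance]


def transmission(image, airlight, radius=1):
    """Equation (24): 1 - min over the patch of min over channels of I_c / A_c."""
    h, w = len(image), len(image[0])
    t_map = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            patch = [image[yy][xx]
                     for yy in range(max(0, y - radius), min(h, y + radius + 1))
                     for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            darkest = min(min(c / a for c, a in zip(px, airlight)) for px in patch)
            t_map[y][x] = 1.0 - darkest
    return t_map


def recover(image, t_map, airlight):
    """Invert equation (22) pixel by pixel: J_B = (I - (1 - t) * A) / t."""
    return [[tuple((c - (1.0 - t) * a) / t for c, a in zip(px, airlight))
             for px, t in zip(row, t_row)] for row, t_row in zip(image, t_map)]


airlight = (1.0, 1.0, 1.0)
# Every pixel has one zero channel, so the dark channel of the radiance is 0.
radiance = [[(0.0, 0.5, 0.9), (0.3, 0.0, 0.6)],
            [(0.7, 0.2, 0.0), (0.0, 0.4, 0.4)]]
foggy = add_fog(radiance, 0.6, airlight)
t_est = transmission(foggy, airlight)
restored = recover(foggy, t_est, airlight)
print(t_est)
```

On real images the dark channel is only approximately zero, so the estimated transmission is coarse rather than exact.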

A scaled depth of the scene can thus be inferred using the relationship between transmission and depth. Since the airlight can be computed easily, we can obtain $J_B$ from equation (22). Because of the min operation, blocky artifacts can appear in the depth map as well as in the recovered scene radiance. These can, however, be removed by using closed-form matting followed by bilateral filtering.

4.2 Motion deblurring in fog/haze

In  this  section,  we  first  discuss  deblurring  under  pure  translations  and  then  examine  the  case  of  general  camera  motion. 
The  blurred  scene  radiance  and  depth  map  are  obtained  as  discussed  in  the  earlier  section. 


4.2.1  Translational  motion 

The motion blur kernel at a point $\mathbf{x}$ can be written as

$$h(\mathbf{x}; \mathbf{s}) = \frac{1}{T_e}\int_0^{T_e}\delta\left(\mathbf{s} - \bar{\mathbf{x}}(\mathbf{x}, \tau)\right) d\tau \tag{25}$$

where $T_e$ is the camera exposure time while $\bar{\mathbf{x}}$ is the motion path of pixel $\mathbf{x}$ as a function of $\tau$. For pure camera translations, the blur kernels at different pixels on the image are related by their relative depths. If $d_0$ is the scene depth of a pixel where the translational motion blur kernel is $h_0(\mathbf{s})$, then the blur kernel at any other point $\mathbf{x}$ that is at a depth $d(\mathbf{x})$ is

$$h(\mathbf{x}; \mathbf{s}) = k^2(\mathbf{x})\,h_0(k(\mathbf{x})\mathbf{s}) \tag{26}$$

where $k(\mathbf{x})$ is the ratio $d(\mathbf{x})/d_0$. Using this relationship, we can obtain the blur kernel at any point in the image from a reference kernel and the relative depth map. The blurred radiance can be expressed as

$$J'_B = C\,I' \tag{27}$$

where the matrix $C$ consists of blur kernels at every pixel lexicographically ordered in each row, the vector $J'_B$ is obtained by lexicographically ordering $J_B$, while the original image is represented as the column vector $I'$. The above equation is solved for $I'$ using the conjugate gradient squared method since the blur matrix $C$ is asymmetric. A look-up table is constructed for discrete steps of depth values and the multiplication by $C$ is equivalently done with a pixel-wise blur operation. Thus, the original (deblurred) radiance can be recovered.
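The depth scaling of equation (26) has a simple discrete counterpart, sketched below. The example assumes the reference kernel is stored as a set of weighted impulses (a dictionary from integer displacement to weight) and that displacements are rounded to the nearest pixel; the function name `kernel_at_depth` and the sample depths are hypothetical. In the continuous form the factor $k^2$ keeps the kernel normalized, whereas here the impulse weights are simply carried over.

```python
def kernel_at_depth(ref_kernel, d0, d):
    """Rescale a translational blur kernel from depth d0 to depth d.

    ref_kernel maps integer displacements (sx, sy) to weights. With
    k = d / d0, an impulse at s in the reference kernel moves to s / k,
    so farther points (k > 1) receive a more compact kernel.
    """
    k = d / d0
    scaled = {}
    for (sx, sy), weight in ref_kernel.items():
        key = (round(sx / k), round(sy / k))
        scaled[key] = scaled.get(key, 0.0) + weight
    return scaled


# Horizontal streak of four impulses observed at the reference depth d0 = 10.
reference = {(0, 0): 0.25, (2, 0): 0.25, (4, 0): 0.25, (6, 0): 0.25}
far = kernel_at_depth(reference, d0=10.0, d=20.0)    # twice as far
near = kernel_at_depth(reference, d0=10.0, d=5.0)    # twice as close
print(far)
print(near)
```

A look-up table over discrete depth steps, as described in the text, could be filled by calling such a function once per depth value.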

4.2.2  General  camera  motion 

Deblurring in the presence of general camera motion is more involved since the blur kernels at different locations are no longer related by just a scale factor. In our approach, we first use fog itself as a cue to segment the scene into road, left, right and sky planes using the planar road constraint [46]. Following this, the motion blur induced on each of these planes is inferred based on the estimated camera motion. Finally, each of these planes is deblurred using the projective motion-blur model. For aerial imagery, only the ground plane exists.

4.2.3  Scene  segmentation 

For road scenes, under the planar assumption [46], an inverse relationship exists between the image height (i.e., the row number) and the scene depth in the road region. For a 3D point $(X, Y, Z)$, the $X$ and $Y$ coordinates would be constant for all the points on the road (assuming a flat planar surface for the road). Thus, we can write $t$ as a function of $y$ as

$$t(\mathbf{x}) = \exp(-\beta d(\mathbf{x})) = \exp\left(-\beta\,\frac{\kappa}{y}\right) \tag{28}$$

where $\kappa$ stands for the constant of proportionality between the depth $d$ and $1/y$. From equation (20), we get the original scene radiance as

$$J = \frac{I - (1 - t)A}{t} \tag{29}$$

Note that for image points that do not correspond to the road, $t$ will be underestimated by equation (28) and this will lead to negative values for $J$ since $(1 - t)$ will be high. We set all the positive-valued pixels obtained from equation (29) to 1 and the rest to 0 to get a binary image. We use median filtering to smooth this image. The result is a segmentation of the scene into road, left, right and sky planes. Approximate distances to each of these planes are assumed to be known a priori.


Figure 9: (a) Foggy road image, (b) Segmentation result.


The  segmentation  result  obtained  for  the  foggy  road  image  of  Fig.  9(a)  is  shown  in  Fig.  9(b).  The  white  pixels 
depict  the  road  and  sky  regions  in  Fig.  9(b). 

As discussed earlier, a planar homography relates any two views of the same planar scene and is given [47, 48] by

$$H = K\left(R + \frac{1}{d}\,T\,\mathbf{n}^{\top}\right)K^{-1} \tag{30}$$

where $K$ is the camera calibration matrix, $R$ is the 3D rotation matrix, $T$ is the 3D translation vector, while $\mathbf{n}$ and $d$ are the plane normal and perpendicular distance from the camera center, respectively. A motion blurred image of an arbitrary plane can be written as

$$B(\mathbf{x}) = \sum_{\lambda} h_T(\lambda)\,I(H_\lambda\mathbf{x})$$

where $H_\lambda$ is the homography matrix that relates the view $\lambda$ to the reference view and $h_T(\lambda)$ is the fraction of time that the camera spends in view $\lambda$ during the exposure time.

The cross-section of a plane at a particular depth $Z = c$ can be written as

$$n_1 X + n_2 Y + n_3 c = d$$

The projection of any point $\left(a, \frac{d - n_1 a - n_3 c}{n_2}, c\right)$ on this line onto the image plane would be $x = \frac{a}{c}$ and $y = \frac{d - n_1 a - n_3 c}{n_2 c}$. For pure in-plane translation and rotation of the camera, assuming the focal length to be unity (for simplicity), the plane image point would move to the position given by a 2D transformation

$$\begin{bmatrix}\cos\theta_z & -\sin\theta_z\\ \sin\theta_z & \cos\theta_z\end{bmatrix}\begin{bmatrix}\frac{a}{c}\\[2pt] \frac{d - n_1 a - n_3 c}{n_2 c}\end{bmatrix} + \begin{bmatrix}\frac{t_x}{c}\\[2pt] \frac{t_y}{c}\end{bmatrix} \tag{31}$$

The displacement of a small patch lying on the plane would be given by the above relation (31). The point spread function at a point $\mathbf{x}$ can be related to the camera motion ($h_T(\lambda)$) as

$$h(\mathbf{x}; \mathbf{s}) = \int h_T(\lambda)\,\delta\left(\mathbf{s} - (\mathbf{x} - H_\lambda\mathbf{x})\right) d\lambda$$

Using this relationship, we can write a matrix equation that relates blur kernels to camera motion as

$$\mathbf{h} = M\,\mathbf{h}_T \tag{32}$$

where $\mathbf{h}$ is a column vector of blur kernels lexicographically ordered and $M$ is a matrix that holds the value 1 at rows corresponding to the displacement of a patch under camera transformation $H_\lambda$. The depth $d$ is factored into the computations using the coarse depth map estimated from equation (24). Following [9], we consider only the three parameters ($t_x$, $t_y$ and $\theta_z$) when estimating camera motion. Equation (32) is solved with an $\ell_1$-norm sparsity constraint on $\mathbf{h}_T$.
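To make the link between camera motion and local blur kernels concrete, the following sketch evaluates the point spread function at a given image point for a short list of poses $(t_x, t_y, \theta_z)$. It assumes unit focal length, rotation about the image origin, translation divided by the local depth, and nearest-pixel rounding of the displacement $\mathbf{x} - H_\lambda\mathbf{x}$. The function name `psf_at` and the sample poses are hypothetical.

```python
import math


def psf_at(point, poses, weights, depth=1.0):
    """Accumulate the blur kernel at `point` on the integer grid.

    Each pose (tx, ty, theta_z) moves the point to H x; its weight is
    deposited at the displacement s = x - H x, rounded to the nearest pixel.
    """
    x, y = point
    kernel = {}
    for (tx, ty, theta), weight in zip(poses, weights):
        c, s = math.cos(theta), math.sin(theta)
        hx = c * x - s * y + tx / depth
        hy = s * x + c * y + ty / depth
        key = (round(x - hx), round(y - hy))
        kernel[key] = kernel.get(key, 0.0) + weight
    return kernel


# Two poses: the reference view and a 2-degree in-plane rotation.
poses = [(0.0, 0.0, 0.0), (0.0, 0.0, math.radians(2.0))]
weights = [0.5, 0.5]
centre = psf_at((0.0, 0.0), poses, weights)
edge = psf_at((100.0, 0.0), poses, weights)
print(centre)  # a single impulse: rotation leaves the centre unblurred
print(edge)    # two impulses: the same motion blurs points far from the centre
```

Each pose contributes one entry per kernel, which is the role of the non-zero entries of $M$ in equation (32); stacking the kernels from several points gives the vector $\mathbf{h}$.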


Figure 10: (a) Foggy and motion-blurred road image, (b) Defogged image, (c) Defogged and deblurred result.

4.2.4  Motion  deblurring  of  planes 

The projective motion Richardson-Lucy algorithm of Tai et al. [6] deblurs the image of a fronto-parallel scene under known camera motion. We can deblur an arbitrary plane after estimating the camera motion using the technique in the previous section. The likelihood probability $P(B, h_T \mid I)$ is modeled using a Poisson distribution and the Kuhn-Tucker conditions are used to arrive at the iterative update equation for the original radiance given the camera motion $h_T$ as

$$I^{n+1} = I^n \circ \sum_{\lambda} h_T(\lambda)\,E^n(H_\lambda^{-1}\mathbf{x}) \tag{33}$$

where $\circ$ denotes point-wise multiplication and $E^n$ is an error matrix given by $E^n = B / B'^n$, where the division is point-wise. Note that $B'^n$ is the blurred scene radiance of the plane under consideration and can be obtained from $I^n$ as

$$B'^n = \sum_{\lambda} h_T(\lambda)\,I^n(H_\lambda\mathbf{x}) \tag{34}$$

In addition, the total variation prior [6] is used to preserve edges.
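The update of equations (33) and (34) can be tried out in one dimension. The sketch below assumes circular integer shifts in place of homographies, a known TSF, and no total variation prior, so it is a simplified illustration rather than the algorithm of [6]. The names `reblur`, `rl_step` and `divergence` are hypothetical.

```python
import math


def reblur(image, shifts, weights):
    """Equation (34) in 1D: weighted sum of shifted copies, circular boundary."""
    n = len(image)
    return [sum(w * image[(x + s) % n] for s, w in zip(shifts, weights))
            for x in range(n)]


def rl_step(estimate, observed, shifts, weights):
    """Equation (33): multiply by the error ratio warped by the inverse shifts."""
    n = len(estimate)
    predicted = reblur(estimate, shifts, weights)
    error = [b / p for b, p in zip(observed, predicted)]
    correction = [sum(w * error[(x - s) % n] for s, w in zip(shifts, weights))
                  for x in range(n)]
    return [e * c for e, c in zip(estimate, correction)]


def divergence(observed, predicted):
    """I-divergence between observation and prediction (lower is better)."""
    return sum(b * math.log(b / p) - b + p for b, p in zip(observed, predicted))


shifts, weights = [0, 1, 2], [0.5, 0.3, 0.2]
truth = [1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 3.0, 1.0]
observed = reblur(truth, shifts, weights)

estimate = list(observed)                 # start from the blurred signal
start = divergence(observed, reblur(estimate, shifts, weights))
for _ in range(50):
    estimate = rl_step(estimate, observed, shifts, weights)
final = divergence(observed, reblur(estimate, shifts, weights))
print(start, final)
```

The multiplicative form keeps the estimate positive and preserves the total intensity of the observation, which is why a strictly positive starting point is used.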

4.3  Experimental  results 

In  this  section,  we  give  representative  results  for  restoring  foggy  and  motion  blurred  images  in  real  scenarios. 

We first examine a road scene image that had incidental motion blur which was non-uniform. This image was taken outside our department with a mobile phone camera. Since it was raining, the collective appearance of raindrops made the scene appear foggy (Fig. 10(a)). The blurred scene radiance and coarse depth map were obtained from equation (22). We computed blur kernels at four different locations in the image. Using the depth map and these blur kernels, the camera motion was estimated using equation (32). The foggy images were segmented into road, left, right and sky planes using the planar road constraint. Each of these planes was then deblurred using equation (33). The defogged and deblurred result is shown in Fig. 10(c) wherein we can observe the improvement that accrues due to deblurring.

The first row of Fig. 11 depicts deblurring as well as dehazing of a real aerial image from the VIRAT dataset. Note the improvements in the restored image over the observation. We also show results on another real aerial image (second row of Fig. 11) kindly provided by Dr. Steve Sudarth from Transparent Sky, USA. The appearance after deblurring and dehazing is quite striking.


Figure 11: VIRAT dataset (first row) and another real aerial image (second row). (a) Hazy and motion-blurred image, (b) Dehazed and deblurred result.

5  Multi-frame  blind  super-resolution  of  non-uniformly  blurred  images 

The effective resolution of an imaging system is limited not only by the physical resolution of its image sensor but also by blur. If blur is present, super-resolution makes little sense without removing the blur. The presence of a spatially varying blur makes the problem much more challenging and, for the present, there are almost no algorithms designed specifically for this case. The critical part of such algorithms is precise estimation of the varying blur, which depends to a large extent on the specific application and type of blur. However, non-uniform deblurring and super-resolution, though two extensively studied topics, have mostly been dealt with independently. The importance of deriving a high-resolution image from low-resolution observations is very well known and is an active area of research. However, what is not obvious is the fact that a motion blurred image can be used to perform SR. This is due to the fact that the motion blurred image of a planar scene stems from the weighted average of a sequence of geometric warps of the original scene.

A large number of papers address the standard SR problem when the images are not blurred. A good survey can be found, for example, in [49] and [50]. Maximum likelihood, maximum a posteriori (MAP), the set theoretic approach using projection onto convex sets, and fast Fourier techniques can all provide a solution to the SR problem.
Spatial-domain  SR  approaches  are  popular  as  they  can  accommodate  complex  priors  and  can  handle  even  non-global 
motion  [49].  A  multi-frame  SR  method  to  detect  and  reconstruct  small  rigid  moving  objects  with  translucent  pixels 
is  elaborated  in  [51].  Movements  between  adjacent  frames  are  generally  assumed  to  be  smooth  [52],  but  due  to 
object  and/or  camera  motion,  the  shifts  in  the  frames  can  be  space-variant.  Brox  and  Malik  [53]  employ  an  optical 
flow  method  which  is  well-suited  for  large  local  motion.  However,  state-of-the-art  SR  techniques  achieve  remarkable 
results  of  resolution  enhancement  only  in  the  case  of  no  blur. 

Sroubek and Cristóbal [54] propose a unifying method that simultaneously estimates the camera motion and the HR image from a set of blurred LR observations without any prior knowledge of the blurs and the original image. Harmeling et al. [55] solve the same multi-frame super-resolution problem using an incremental expectation maximization (EM) framework that does not require explicit image or blur priors. But these approaches are based on the standard convolution model and are restricted to translational motion. They do not tackle the more realistic case of space-varying blur. A naive approach to tackling this scenario is to apply super-resolution using the convolution model to small space-invariant regions in the image and then sew the patches together. Unfortunately, it is not easy to sew the patches together without artifacts at the seams. An alternative is first to use the estimated PSFs to approximate the spatially varying PSF by interpolation of adjacent kernels and then compute the image of improved resolution. The main problem of these naive procedures is that they are relatively slow, especially if applied at too many positions. We present a new joint approach to the super-resolution and non-uniform deblurring problem. We use HR PSFs estimated at a few locations in the LR frames to compute the HR TSFs which reveal the camera motion during exposure for the latent HR image. The restoration step is finally carried out using suitable regularization terms on the image.


5.1  The  SR  motion  blur  model 


In the discrete domain, the operation of blurring and downsampling can be represented by the following equation.

$$g(i,j) = D\left(\sum_{l=1}^{N_T}\omega(l)\,f(H_l(i,j))\right) \tag{35}$$

Here $f(i,j)$ denotes the latent HR image of the scene and $g(i,j)$ is the blurred LR image. $H_l(i,j)$ denotes the image coordinates when a homography $H_l$ is applied on the point $(i,j)$. (When $H_l(i,j)$ takes non-integer values, we assign values to the pixels neighboring $H_l(i,j)$ by the bilinear interpolation principle.) $D$ is the downsampling operator or the decimation operator that models the function of the CCD sensors. A multi-frame method that uses the information present in adjacent frames is used to increase the spatial resolution of the super-resolved image. If we have $K$ LR frames, then we can rewrite equation (35) as follows

$$g_k(i,j) = D\left(\sum_{l=1}^{N_T}\omega_k(l)\,f(H_{lk}(i,j))\right), \quad k = 1, 2, \ldots, K \tag{36}$$

where $g_k(i,j)$ is the $k$th LR observation and $\omega_k$ and $H_{lk}$ are the associated TSFs and homographies.
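The forward model of equation (36) can be sketched for a 1D signal as blurring through a TSF followed by decimation. The example assumes circular integer shifts in place of homographies and models $D$ as block averaging, which is an assumption about the sensor rather than something the report specifies. The names `blur_tsf` and `decimate` are hypothetical.

```python
def blur_tsf(f, tsf):
    """Weighted sum of shifted copies of f; tsf is a list of (shift, weight)."""
    n = len(f)
    return [sum(w * f[(i + s) % n] for s, w in tsf) for i in range(n)]


def decimate(signal, factor):
    """D: average each block of `factor` HR samples into one LR sample."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


f = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # latent HR signal
tsfs = [[(0, 0.5), (1, 0.5)],                   # frame 1: two poses
        [(-1, 1.0)]]                            # frame 2: a single pose
frames = [decimate(blur_tsf(f, tsf), 2) for tsf in tsfs]
print(frames)
```

The two LR frames sample the same HR signal differently because their TSFs differ, which is the extra information a multi-frame method exploits.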


5.2  The  proposed  method 

Consider $K$ blurred LR observations $g_1, g_2, \ldots, g_K$ of a scene which are related to the latent HR image $f$ through the HR TSFs $\omega_1, \omega_2, \ldots, \omega_K$. The objective is to estimate $f$ from $g_1, g_2, \ldots, g_K$. To compute the HR TSFs, we reformulate the TSF estimation technique in [56] for the SR scenario by using HR blur kernels estimated at a few points in the LR frames. With the knowledge of the HR TSFs, we solve for the latent HR image $f$ within a regularization framework. The details of these steps are explained in the following subsections.


5.2.1  HR  PSF  estimation 

Our first step is to estimate HR blur kernels at different locations in the LR frames. To this end, we use [57] to determine regions in the image with good texture and long edges that are suitable for blur kernel estimation. From this subset, we select $N_p$ point locations, and blurred image patches from the $K$ LR observations (denoted by $g_1^1 g_1^2 \ldots g_1^{N_p}$, $g_2^1 g_2^2 \ldots g_2^{N_p}$, ..., $g_K^1 g_K^2 \ldots g_K^{N_p}$) are cropped around these points. Within each set of patches $(g_1^i, g_2^i, \ldots, g_K^i)$, we assume the blur to be space-invariant and use the blind SR technique in [54] to get the HR blur kernels $h_1^i, h_2^i, \ldots, h_K^i$. The method in [54] requires that the number of observations $K$ needed to estimate the HR blur kernels be greater than the square of the SR factor $\epsilon$, i.e., $K > \epsilon^2$. We found in our experiments that the estimates of the blur kernels from the method in [54] are quite accurate.

5.2.2  HR  TSF  from  HR  PSFs 

Our next objective is to estimate the HR TSF $\omega_k$ that concurs with the $N_p$ observed HR blur kernels $h_k^1, h_k^2, \ldots, h_k^{N_p}$ for $k = 1, 2, \ldots, K$. Note that the TSF $\omega_k \in \mathbb{R}^{N_T}$ will be a sparse vector in practice because the camera motion during exposure would result in very few transformations out of all the possible elements of $\mathbf{T}$. Hence while solving for $\omega_k$, we impose a sparsity constraint for regularization. Following [56], we express the blur kernel $h_k^i$ as $h_k^i = M_k^i\omega_k$ for $i = 1, 2, \ldots, N_p$ since each component of the blur kernel $h_k^i$ at a given pixel is a weighted sum of the components of the TSF $\omega_k$. Here, $M_k^i$ is a matrix whose entries are determined by the location of the blur kernel and the bilinear interpolation coefficients. Note that the $N_p$ point locations were chosen on the LR grid and patches were cropped around these points from the LR frames. Since the TSFs are being estimated on an HR grid, the $N_p$ point locations should be scaled by the SR factor $\epsilon$, and our TSF estimation step differs from the method proposed in [56] in this important respect. If the number of elements in the blur kernel is $N_h$, then the size of the matrix $M_k^i$ will be $N_h \times N_T$. By stacking all the $N_p$ blur kernels as a vector $h_k$, and suitably concatenating the matrices $M_k^i$ for $i = 1, 2, \ldots, N_p$, the HR blur kernels can be related to the HR TSF as

$$h_k = M_k\omega_k \tag{37}$$

The matrix $M_k$ is of size $N_pN_h \times N_T$. To get an estimate of the HR TSF that is consistent with the observed HR blur kernels and is sparse as well, we minimize the following cost

$$\arg\min_{\omega_k}\left\{\|h_k - M_k\omega_k\|_2^2 + \lambda\|\omega_k\|_1\right\} \tag{38}$$

where the positive scalar $\lambda$ controls the extent of sparsity of the estimated HR TSF. We estimate each HR TSF $\omega_1, \omega_2, \ldots, \omega_K$ separately by minimizing the above cost function using the toolbox in [29].
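A cost of the form (38) can be minimized with iterative soft-thresholding (ISTA). The sketch below is a generic illustration and not the toolbox of [29]: the matrix `M`, the kernel vector `h` and the value of `lam` are toy values chosen so that the result can be checked by hand, and the names `soft_threshold` and `ista` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm for a single coefficient."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(M, h, lam, iters=200):
    """Minimize ||h - M w||_2^2 + lam * ||w||_1 by iterative soft-thresholding."""
    rows, cols = len(M), len(M[0])
    # 2 * ||M||_F^2 upper-bounds the Lipschitz constant of the data-term gradient.
    step = 1.0 / (2.0 * sum(v * v for row in M for v in row))
    w = [0.0] * cols
    for _ in range(iters):
        resid = [sum(M[i][j] * w[j] for j in range(cols)) - h[i]
                 for i in range(rows)]
        grad = [2.0 * sum(M[i][j] * resid[i] for i in range(rows))
                for j in range(cols)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam)
             for j in range(cols)]
    return w


# Toy problem: two candidate poses, three kernel bins (the last is unreachable).
M = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
h = [0.8, 0.05, 0.02]          # noisy kernel dominated by the first pose
w = ista(M, h, lam=0.2)
print(w)
```

With these toy values the small second weight is driven exactly to zero while the dominant weight is shrunk by `lam / 2`, which shows how the penalty trades fidelity for sparsity.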
5.2.3 Deblurring


Equation  (36)  can  be  expressed  in  the  matrix- vector  notation  [54]  as 

9k  =  DHkf  (39) 

where  and  D  are  the  matrices  that  perform  the  non-uniform  blurring  and  downsampling  operations  respectively. 

Once $\omega_1, \omega_2, \ldots, \omega_K$ are known, we formulate the energy function based on the observation error
and a regularization term as

$E(f) = \sum_{k=1}^{K} \|D H_k f - g_k\|_2^2 + \alpha f^T L f$  (40)

where $L$ is the discrete form of the variational prior and is a positive semidefinite block tridiagonal matrix [54]
constructed of values depending on the gradient of $f$. The rationale behind the choice of this prior is to constrain
the local spatial behaviour of the images: in smooth areas it has the same isotropic behaviour as the Laplacian, yet it
also preserves edges. The disadvantage is that it is highly non-linear, so the half-quadratic algorithm has to be used
to minimize (40). The scalar $\alpha$ is the weight of the image smoothness term.

$\frac{\partial E}{\partial f} = 0 \;\Rightarrow\; \left( \sum_{k=1}^{K} H_k^T D^T D H_k + \alpha L \right) f = \sum_{k=1}^{K} H_k^T D^T g_k$  (41)

$H_k^T$ is equivalent to blurring through a TSF, except that the warping to be applied is $H_i^{-1}$ instead of $H_i$ [56].
The matrix $D^T$ spreads the intensity of each LR pixel equally over $\epsilon^2$ pixels in HR.
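
The action of $D^T$ can be checked on a toy example. The sketch below assumes that $D$ averages each $\epsilon \times \epsilon$ block of HR pixels into one LR pixel; the decimation operator actually used in [54] may differ, and the function name is hypothetical.

```python
import numpy as np

def downsampling_matrix(hr_rows, hr_cols, eps):
    """Block-averaging D of size (hr_rows/eps * hr_cols/eps) x (hr_rows * hr_cols)."""
    def avg_1d(n):
        # Each LR sample is the mean of eps consecutive HR samples.
        return np.kron(np.eye(n // eps), np.full((1, eps), 1.0 / eps))
    # Separable 2-D operator acting on row-major vectorized images.
    return np.kron(avg_1d(hr_rows), avg_1d(hr_cols))

eps = 2
D = downsampling_matrix(4, 4, eps)           # maps a 4x4 HR image to a 2x2 LR image
g = np.array([[4.0, 8.0], [12.0, 16.0]])     # toy LR image
spread = (D.T @ g.ravel()).reshape(4, 4)     # D^T applied to the LR image
print(spread)
```

Under this assumption every LR intensity is divided equally among the $\epsilon^2 = 4$ HR pixels of its block, so the total intensity of `spread` equals that of `g`.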

We use the method of conjugate gradients to solve (41) and then adjust the solution $f$ to contain values in the
admissible range, typically the range of values of $g_k$.
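
As an illustration of this step, the sketch below solves a 1-D analogue of (41) with a hand-written conjugate-gradient loop and then clips the result to the range of the observations. Several simplifications are assumed: the blur matrices are simple circular shift-and-average operators rather than TSF blurs, $L$ is a fixed second-difference matrix rather than the gradient-dependent prior updated by half-quadratic iterations, $D$ is block averaging, and all names are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, n_iter=500, tol=1e-10):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n, eps, alpha = 32, 2, 1e-3
f_true = np.zeros(n)
f_true[10:20] = 1.0                                   # toy HR signal
D = np.kron(np.eye(n // eps), np.full((1, eps), 1.0 / eps))   # block averaging
# Stand-in blurs: each H_k averages the signal with a circularly shifted copy.
H = [0.5 * (np.eye(n) + np.roll(np.eye(n), s, axis=1)) for s in (1, 2, 3, 4, 5)]
g = [D @ Hk @ f_true for Hk in H]                     # LR observations, as in (39)

# Fixed circular second-difference prior; positive semidefinite.
L = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)

A = sum(Hk.T @ D.T @ D @ Hk for Hk in H) + alpha * L  # left-hand side of (41)
b = sum(Hk.T @ D.T @ gk for Hk, gk in zip(H, g))      # right-hand side of (41)
f_cg = conjugate_gradient(A, b)

# Adjust the solution to the admissible range, here the range of the observations.
lo, hi = min(gk.min() for gk in g), max(gk.max() for gk in g)
f_hat = np.clip(f_cg, lo, hi)
```

With a gradient-dependent $L$, the same linear solve would be repeated inside the half-quadratic loop, with $L$ rebuilt from the current estimate before each solve.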

5.3  Experimental  results 

We  begin  with  a  synthetic  example.  The  image  of  a  tablecloth  (of  size  530  x  640  pixels)  shown  in  Fig.  12(a)  was 
used  as  the  latent  HR  image  of  the  scene.  To  simulate  the  shake  incurred  by  a  camera  during  exposure,  we  manually 
generated  camera  motion  with  a  connected  path  in  the  motion  space  and  initialized  the  weights.  Five  different  HR 
TSFs  were  synthesized  thus  and  the  five  corresponding  blurred  LR  frames  were  generated  from  the  latent  image  by 
first  blurring  (using  the  TSF  model)  and  then  downsampling  by  a  factor  of  two.  For  an  SR  factor  of  2,  the  method  in 
[54]  needs  a  minimum  of  five  LR  images  for  the  computation  of  the  HR  PSF.  The  parameters  of  the  TSF  ranged  as 
follows: $\theta$ ranged from -2 to 2 degrees in steps of 0.5, and $t_x$ and $t_y$ ranged from -3 to 3 pixels in steps
of one pixel.

Figure 12: (a) Latent HR image, (b) blurred LR observations, (c) Sroubek et al. [54], (d) upsampled output of
Paramanand et al. [56], (e) upsampled output of Whyte et al. [10], and (f) our output. Row 3: Zoomed-in regions
from the blurred LR observations. Rows 4, 5, 6: Zoomed-in regions from the original image (first column), Sroubek et
al. [54] (second column), upsampled output of Paramanand et al. [56] (third column), upsampled output of Whyte et
al. [10] (fourth column) and our output (fifth column).

Figure 13: Real experiment 1: (a) Blurred LR observations, (b) our output, (c), (d) zoomed-in regions from (a) and

We selected four spatially separated image patches having good texture from the 265 x 320 LR observations
and used the algorithm in [54] to determine the HR blur kernels corresponding to these patches. The blurred LR
observations with the selected patches (enclosed in white boxes) are shown in Fig. 12(b). The HR TSFs were then
computed from these HR blur kernels using the method described in section 5.2. The deblurred HR image is shown
in Fig. 12(f). The output obtained by using the convolution model in [54] is shown in Fig. 12(c). For comparison, we
also upsampled the LR output image of [56] by a factor of two using bicubic interpolation; this is shown in Fig. 12(d).
Note that all five LR observations were given as input to the algorithm in [56]. Another comparison, obtained
by first deblurring one of the LR frames using the technique in [10] and then upsampling the result, is provided in
Fig. 12(e). In row 3 of Fig. 12, we show three sets of zoomed-in regions from the five blurred LR observations.
Zoomed-in patches from the original image (first column), output of [54] (second column), upsampled output of [56]
(third column), upsampled output of [10] (fourth column) and our output (fifth column) are shown in rows 4, 5 and 6.
We observe that our output is sharper and closer to the original than the outputs of all the other methods. Fig. 13
depicts super-resolution results for a real case. Note the improvement in the readability of text after super-resolution.
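
To make the size of the motion space in the synthetic experiment concrete, the sketch below enumerates the parameter grid stated above (rotation from -2 to 2 degrees in steps of 0.5, translations from -3 to 3 pixels) and places a sparse TSF on a connected path through it. The path and its weights are made up for illustration; the report does not list the ones actually used.

```python
import itertools
import numpy as np

thetas = np.arange(-2.0, 2.0 + 1e-9, 0.5)     # degrees: -2, -1.5, ..., 2
txs = np.arange(-3, 4)                         # pixels: -3, ..., 3
tys = np.arange(-3, 4)
grid = list(itertools.product(thetas, txs, tys))
N_T = len(grid)                                # 9 * 7 * 7 candidate transformations

# Lookup from a (theta, tx, ty) sample to its position in the TSF vector.
index = {(float(th), int(tx), int(ty)): j for j, (th, tx, ty) in enumerate(grid)}

# Hypothetical connected path: consecutive samples differ by one grid step.
path = [(0.0, 0, 0), (0.0, 1, 0), (0.5, 1, 0), (0.5, 1, 1), (1.0, 1, 1)]
weights = [0.3, 0.25, 0.2, 0.15, 0.1]          # fraction of exposure time per pose

omega = np.zeros(N_T)
for sample, weight in zip(path, weights):
    omega[index[sample]] = weight

print(N_T, np.count_nonzero(omega))            # 441 5
```

Only 5 of the 441 entries are non-zero in this toy TSF, which is the kind of sparsity that the $\ell_1$ term in (38) is meant to favour.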

6  Conclusions,  highlights  and  future  directions 

In this report, we investigated the problem of processing images and videos in the presence of spatially-varying motion
blur. Intermediate results were produced regularly, and information was exchanged between the PI and Dr. Guna from
AFRL to facilitate interaction and progress.

Key  accomplishments 

Analytical  expressions  were  derived  to  relate  the  degree  of  blur  to  the  nature  of  the  scene  and  camera  motion,  and 
robust  methods  were  proposed  for  image  deblurring. 

•  Based on the notion of a global transformation spread function, we have proposed a formulation that can effectively
restore space-variant motion-blurred images affected by arbitrarily shaped blur kernels.

Our  efforts  will  be  valuable  in  mitigating  the  effects  of  motion  blur  in  practical  situations. 

Following this, we addressed the related problem of automatically detecting occluded regions given a pair of
images of a scene taken from different viewpoints. The occlusion could be due to a single object or multiple objects.
What makes this problem difficult for surveillance applications is the fact that the image pair is both geometrically
and photometrically out of sync.


•  We have presented a unified framework based on a sparsity prior for automatically detecting occluder(s). The
method is reasonably robust to non-uniform motion blur as well as variations in camera pose (without the need
for deblurring).

Given  the  distances  involved  in  aerial  imagery  and  the  fact  that  it  is  not  uncommon  to  use  plexiglass  for  lens  protection, 
the  problems  of  deblurring  and  dehazing  often  co-occur. 

•  We have expanded the scope of the restoration problem to include not only deblurring but also dehazing, with
the aim of recovering visibility lost to poor contrast.

Finally,  we  addressed  the  problem  of  estimating  the  latent  high-resolution  (HR)  image  from  a  set  of  non-uniformly 
blurred  low-resolution  (LR)  images. 

•  We  have  presented  a  new  framework  that  judiciously  exploits  sub-pixel  motion  arising  from  motion  blur  to 
perform  super-resolution  of  images. 

The  HR  TSF,  which  reveals  the  camera  motion  during  exposure  for  the  latent  HR  image,  is  computed  from  the  LR 
frames. 

Each  of  the  above-mentioned  efforts  was  comprehensively  analyzed  and  validated  with  synthetic  as  well  as  real 
data. 

•  While some of our new findings have already been published in peer-reviewed venues, others are under
review in international journals.

Expected  impact 

The related problems of image restoration, registration, dehazing, and super-resolution, all in the presence of blurring
due to camera motion, that have been investigated within the ambit of this proposal stand at the cutting edge of
research. The work undertaken in this proposal will go a long way in providing much-needed theoretical insights into,
and a strong mathematical underpinning for, the relatively less-understood but widely prevalent phenomenon of motion
blur. The potential ramifications of the work undertaken are many, some of which are discussed next.

A sharp image is beneficial not only from the perspective of visual appeal but also because it forms the basis for
moving object tracking, change detection, robust feature extraction, etc. There is another novel spin to this problem.
Because the motion blur results from platform motion, knowledge of motion blur can prove to be a valuable cue in
stabilizing the captured video. Our work on registration can form the basis for handling view-point changes as these
can be captured through poses of the camera as part of motion blur estimation and the intensities can be compared
through weighted geometric warps of the images. The twin problems of deblurring and dehazing open up exciting
avenues for future research. If this capability can be achieved on board and in real-time, it can be very valuable for
applications involving aerial surveillance. Our work on super-resolution will be especially valuable while attempting
to track targets that occupy just a few pixels. A possible extension to our work will be to allow for depth variations in
the scene.

Future  directions 

The  theories  proposed  within  the  ambit  of  this  proposal  are  not  limited  to  aerial  images  and  can  potentially  be  used  in 
any  situation  in  which  the  scene  can  be  modeled  as  approximately  planar.  This  could  include  even  human  faces,  for 
example. The importance of face recognition for homeland security cannot be overstated. Current works on face
recognition  assume  that  the  face  is  reasonably  focused  or  at  worst  suffers  from  uniform  motion  blur.  Our  work  can 
help  generalize  the  theory  of  face  recognition  to  non-uniform  blurring  situations  too.  The  future  of  face  recognition 
research  is  moving  towards  recognition  of  subjects  in  motion  and  our  work  will  be  critical  to  this  futuristic  scenario. 

Although our focus is on camera motion, the entire theory is also applicable to the case when there is a single
moving object relative to the camera. An interesting offshoot of this problem arises when both the camera and the
targets are moving. What is fascinating is that the motion blur can potentially be used for segmenting
independently moving objects in a scene. This is a very fertile area and there is a lot of excitement surrounding it.
We believe that the natural connection that exists between motion blur and alpha-mattes should be exploited to segment
dynamic scenes. As a related spin-off of the motion deblurring problem, it should be possible to extend the proposed
framework to the area of image forensics, for instance to detect splicing in images.

We shall continue our investigations into the problems of image registration, occlusion detection, and
super-resolution, especially from the viewpoint of low-rank, sparse error matrix decomposition. It is envisioned that
this will lead to a robust analytical and computational framework that can be exploited to address practical scenarios.
In addition, we shall also focus on the recovery of 3D information of scenes imaged by a moving camera. Here, the
motion blur will be harnessed as a cue for depth since the extent of motion blur at an image point is dictated both by
scene structure and camera motion. We shall also investigate the related spin-off problem of splicing detection in the
domain of image forensics.


Summary  statement 

The works carried out under this proposal can help foster excellence in basic research, create new scientific
understanding, and make available unforeseen and innovative technological options for the scientific community. They
can revolutionize and profoundly impact the future capabilities of the AFOSR/AFRL in their ability to harness valuable
information from captured image data, giving them a distinct technological edge in the air to meet the growing
challenges of the future.

Proc. IEEE International Conference on Image Processing (ICIP) Melbourne, Sep. 2013.

2.  T.  Veeramani,  A.  N.  Rajagopalan  and  Guna  Seetharaman.  Restoration  of  foggy  and  motion-blurred  road  scenes, 
in  Proc.  IEEE  International  Conference  on  Image  Processing  (ICIP)  Melbourne,  Sep.  2013. 

3.  P. Rao, A. N. Rajagopalan and Guna Seetharaman. Harnessing motion blur to unveil splicing. IEEE Transactions
on Information Forensics and Security (under review, revised version submitted).

4.  T. Veeramani, A. N. Rajagopalan and Guna Seetharaman. Restoration of foggy and motion blurred images.